Fast Approximate String Matching in a Dictionary
نویسندگان
چکیده
A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that on-line search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms , when the tolerated error level is low (which is reasonable in text searching).
منابع مشابه
Eecient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm
A key approach in string processing algorithmics has been the labeling paradigm KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the rst optimal parallel algorithm for suux tree construction was given in SV94], the labeling paradigm was considered not to be compe...
متن کاملApproximate String Matching: Theory and Applications (La Recherche Approchée de Motifs : Théorie et Applications)
The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows : Let D = {x1, x2, . . . xd} be a set of d words defined on an alphabet Σ, let q be a query defined also on Σ, and let k be a positive integer. We want to build a data structure on D capable of answering the following query : find all words i...
متن کاملApproximate string matching algorithms for limited-vocabulary OCR output correction
Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian a...
متن کاملA Fast Algorithm for Approximate String Matching on Gene Sequences
Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...
متن کاملFast approximate string matching with finite automata
We present a fast algorithm for finding approximate matches of a string in a finite-state automaton, given some metric of similarity. The algorithm can be adapted to use a variety of metrics for determining the distance between two words.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998